Abstract: Artificial Neural Networks (ANN) demonstrate a compelling application of
AI in predicting student performance, a critical aspect for both students and educators.
Accurate forecasting of student achievements enables educators to monitor progress
effectively, allowing educational institutions to optimize outcomes and improve
student results. This study focuses on leveraging ANN for predictive analytics in
student performance. Through a detailed evaluation of transfer functions, optimizers,
learning rates, and momentum values, the model achieves 98% accuracy with a
specific configuration: a learning rate of 0.005, momentum of 0.7, Sigmoid transfer
function, and SGD optimizer. Additionally, the study performs a comparative
analysis of various Machine Learning algorithms, including ANN, Support Vector
Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees (DT), Naive Bayes (NB),
and Logistic Regression (LR). Using data from 689 B.Tech students at IP University,
the analysis reveals that ANN outperforms the other algorithms with an accuracy of
97%. This high accuracy demonstrates the potential of ANN in educational settings,
providing a valuable tool for educators to enhance student performance and outcomes.
Introduction

Technology is advancing rapidly, permeating both our daily lives and
professional endeavors. This relentless progression has triggered notable
transformations in the education sector, rooted in the fields of data mining and
the application of artificial intelligence (AI). Predicting student performance
is a common task in the education field, and many methods, algorithms, and
approaches based on machine learning (ML), educational data mining (EDM), and
artificial neural networks have been researched and applied to predict student
performance as accurately as possible (Abulhaija et al., 2023; Preetha and
Anitha, 2022). Selecting and extracting features have played important roles in
developing decision-making models for predicting student performance (Zaffar et
al., 2020). Learning management systems generate valuable quantitative insights
through reports and learning data, enabling educators to analyze and improve
course content and its delivery (Kuppusamy and Joseph, 2021). The emergence of
global pandemics, such as COVID-19, has further accelerated the migration of
education to digital platforms, alleviating concerns related to homework, tests,
and attendance for young students. However, this shift has also reduced
face-to-face interactions between educators and pupils, potentially impacting
the seriousness with which online courses are taken.

Addressing this challenge is imperative, as we must develop the capability to
predict student performance when instruction transitions away from online
learning. Reliable predictions are crucial in preventing a significant decline
in student achievement. Ultimately, the application of predictive analytics can
benefit not only students but also professors, administrators, and the overall
reputation of educational institutions.

Students today employ intelligent devices to connect to wireless networks and
access digital content, enabling them to engage in customized and uninterrupted
learning experiences. This concept of smart education, characterizing learning
in the digital era, has garnered increasing interest (Zhu et al., 2016).


Teachers often lack real-time insights into students' actual performance, so
they may resort to extrapolating it from statistical data (Hamadneh et al.,
2022). An evolving composite model can be used to evaluate the interrelation
between the learning process, course components, and student performance (Jiao
et al., 2022). Through the use of artificial neural networks and data mining
models, assessment metrics and key factors affecting student performance can be
examined to determine the most effective approaches (Rodriguez-Hernandez et al.,
2021).

Initially, our focus lies in investigating and 
implementing an Artificial Neural Network (ANN). 
Ensuring that our model operates effectively and 
produces the desired results is crucial. Subsequently, we 
will compare various machine learning algorithms using 
our dataset to determine which yields the highest quality 
results. 

This investigation prompts several key questions: 

Q.1 How does the performance of a neural network compare to that of other
classifiers?

Q.2 What are the factors that influence the performance of an ANN in the
context of education?

Q.3 How can manipulating epochs, training, and testing sizes contribute to
reducing errors in the ANN?


Artificial Neural Network (ANN) 

An ANN is a collection of interconnected processing elements called neurons.
In its simplest form, the neurons are arranged in two layers: the first is the
input layer, and the second is the output layer. All the neurons in an ANN are
connected to perform a specific task (Haloi et al., 2023; Venkata and Damodar,
2023). Each neuron is called a node, and each neuron-to-neuron connection is
called an edge. Every edge carries a weight that is multiplied by the value of
its input node. The weighted inputs are summed, passed through an activation
function, and sent to the output layer. The weights determine the strength of
each connection, and the activation function maps the weighted sum to the
output of the neuron.
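
The weighted-sum-and-activation step described above can be illustrated with a short sketch. It is a minimal illustration, not the study's implementation: the function names, the input values, and the weights are hypothetical, and the sigmoid is chosen only because it is one of the transfer functions evaluated later.

```python
import math

def sigmoid(z):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias=0.0):
    """Multiply each input by its edge weight, sum, then apply the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical example: three input nodes feeding one output neuron.
out = neuron_output([0.9, 0.4, 0.6], [0.5, -0.3, 0.8])
print(round(out, 3))
```

A larger weight makes the corresponding input contribute more to the sum, which is the sense in which weights set the strength of a connection.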

Working of Multilayer Perceptron (MLP) 

A multilayer perceptron (MLP) can be trained using both supervised and
unsupervised learning methods. In supervised learning, training aims to
identify whether a selected object belongs to a specified group of predictors.
The MLP handles both prediction and classification problems. It has three
layers. The first is the input layer, where the input variables are applied.
These input variables are multiplied by weights and passed to the second layer,
the hidden layer, which performs operations that map the input data. The last
layer is the output layer, which produces the output. Activation functions are
also applied in the MLP. The model measures the error by comparing the
predicted output with the desired output. If the two differ, the error is
backpropagated through the model, and the process repeats until the desired
output is reached.
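
The forward pass through the three layers, followed by the comparison with the desired output, might look as follows. This is a sketch under stated assumptions: the 2-2-1 layer sizes, all weight values, and the squared-error measure are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer; weights[j] holds the incoming weights of neuron j."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

def squared_error(predicted, desired):
    """Error between the predicted output and the desired output."""
    return 0.5 * sum((p - d) ** 2 for p, d in zip(predicted, desired))

# Hypothetical 2-2-1 network with arbitrary weights.
x = [1.0, 0.0]
y = forward(x, [[0.2, -0.4], [0.7, 0.1]], [0.0, 0.0], [[0.5, -0.5]], [0.0])
err = squared_error(y, [1.0])
print(y, err)
```

The nonzero error returned here is the quantity that backpropagation would then push back through the layers.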

The Backpropagation Method (BP) 

The use of supervised learning in MLPs has expanded the application of BP. BP
occurs in two stages: a forward stage and a backward stage. In forward
propagation, the predictive weights of the MLP are evaluated and the input
signals are sent through the layers to produce the output. In the second stage,
backward propagation, an error signal is produced by comparing the MLP output
with the expected output. This signal is propagated through the layers in the
backward direction. In this way, the MLP can optimize the predictive weights
and reduce the error in each iteration until a certain accuracy is achieved. In
the present study, gradient descent is used as the optimization function to
minimize the error. In our research, we adjusted the learning rate and the
activation function across learning cycles to achieve an accurate result
(Rodriguez-Hernandez et al., 2021).
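
The two BP stages can be sketched for a tiny network trained by stochastic gradient descent with momentum. This is an illustrative sketch only, not the model used in the study: the 2-2-1 architecture, the OR-gate toy data, the squared-error loss, and the values eta=0.1 and 2000 epochs are all assumptions made so that the example runs quickly.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, targets, n_hidden=2, eta=0.1, momentum=0.7, epochs=2000, seed=0):
    """Train a one-hidden-layer network; returns the mean error per epoch."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # The last entry of every weight row is the bias weight.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    v1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]  # momentum terms
    v2 = [0.0] * (n_hidden + 1)
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(samples, targets):
            # Forward stage: send the input signal through the layers.
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * a for w, a in zip(row, xb))) for row in w1]
            hb = h + [1.0]
            y = sigmoid(sum(w * a for w, a in zip(w2, hb)))
            err = y - t
            total += 0.5 * err * err
            # Backward stage: propagate the error signal towards the input.
            delta_out = err * y * (1.0 - y)
            delta_hid = [delta_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Gradient descent update with momentum.
            for j in range(n_hidden + 1):
                v2[j] = momentum * v2[j] - eta * delta_out * hb[j]
                w2[j] += v2[j]
            for j in range(n_hidden):
                for i in range(n_in + 1):
                    v1[j][i] = momentum * v1[j][i] - eta * delta_hid[j] * xb[i]
                    w1[j][i] += v1[j][i]
        history.append(total / len(samples))
    return history

# Toy data (logical OR), used only to show the error shrinking over iterations.
history = train([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 1])
print(history[0], history[-1])
```

The hidden-layer deltas are computed before any weight is changed, so both layers are updated from the same forward pass.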


Related work 

Previous research studies are important because they form the foundation for
new research endeavors. Our research idea is rooted in the findings of previous
studies, which have contributed valuable insights to the field. While there are
numerous relevant papers, Table 1 highlights some of the most crucial ones for
reference.

In this research, we employ a range of machine learning classifiers, including
Support Vector Machines (SVM), Decision Trees, Random Forest, Logistic
Regression, Naive Bayes, and Artificial Neural Networks (ANN). Our objective is
to comprehensively examine previous research that has utilized these
classifiers. We analyze this extensive body of work from various angles,
considering factors related to education, psychology, emotions, and students'
backgrounds.
Table 1. Summarization of previous work.

Author | Classifiers | Outcome
Musso et al., 2013 | ANN | Divided students into two groups (greater than 33% and less than 33%) and achieved 100% accurate categorization.
Triventi, 2014 | Binomial Regression | Analyzed the impact of working hours on working students' study methods.
Kyndt et al., 2015 | ANN | Predicted the student's end-term performance after the first-year completion based on three approaches: cognition, motivation, and learning.
Mesarić, 2016 | Decision Tree | Classified students into different groups based on first-year results and teacher rankings with 79% accuracy.
Zhu et al., 2016 | Framework of Smart Education | Proposed a three-tier framework for Smart Education.
Alves et al., 2017 | Structural equation model (SEM) | Explored factors affecting student performance, with family variables contributing significantly (90%).
Ahmad and Shahzadi, 2018 | ANN | Predicted student passing risks with 95% training accuracy and 85% testing accuracy.
Adekitan and Salau, 2019 | Decision Tree, Random Forest, Naive Bayes, PNN, Tree Ensemble, Logistic Regression | Analyzed three years of grading data to predict final year results, with Logistic Regression achieving 89.15% accuracy.
Abu-Zohair, 2019 | NB, KNN, LDA, MLP, SVM | Analyzed data for start-up universities and found LDA performed best with 79% accuracy.
Vairachilai and Vamshidharreddy, 2020 | Decision Tree, Support Vector Machine (SVM), and Naive Bayes | Analyzed data for start-up universities and found LDA performed best with 79% accuracy.
Zhang et al., 2021 | Artificial Intelligence and Educational Data Mining Algorithms | Compared various AI and DM algorithms and identified Decision Tree and Logistic Regression as effective for complex problems.
Ahmad et al., 2021 | ANN | Predicted student results based on first semester scores with 93.20% accuracy.
Ghosh and Janan, 2021 | Random Forest Classifier | Investigated reasons for student dropouts and achieved a 98.66% accuracy rate.
Agarwal and Agarwal, 2022 | Data mining Classifiers and ANN | Analyzed different classifiers and ANN models, with Decision Tree and Naive Bayes achieving the highest prediction accuracy.
Orji and Vassileva, 2022 | Decision Tree, K-Nearest Neighbour, Random Forest, Logistic Regression, and Support Vector Machine | Studied student learning patterns and achieved 94.9% accuracy with Random Forest.
Yadav and Deshmukh, 2022 | Artificial Intelligence and Data Mining classification algorithms | Explored various classification and ANN algorithms, with accuracy varying based on attributes.
Wojciukc et al., 2022 | CNN | The research assesses the significance of hyperparameters, determines the most effective ranges for these hyperparameters, and evaluates various optimization techniques.
Honghe Jin, 2022 | Supervised Learning Machine Algorithms | The paper introduces a concept of hyperparameter importance by analyzing the variance of the risk function across different hyperparameter values. Additionally, it outlines a technique for estimating this importance through subsampling procedures.
Liu et al., 2023 | Reinforcement Learning | The paper introduces an innovative approach to accelerate the training process of hyperparameter optimization (HPO) for machine learning algorithms, addressing the challenge of time and resource-intensive procedures.
Chavez et al., 2023 | | Predicted student exam outcomes without revealing student information, achieving 93.81% accuracy.



Methods and Materials 
Figure 1 illustrates the framework employed in our research, which comprises
several stages. In the study by Rodriguez-Hernandez et al. (2021), different
parameters were tested, such as learning rate values (0.001, 0.0005, 0.0001,
0.00005, 0.00001) and transfer functions for the hidden and output layers
(hyperbolic tangent, Linear sigmoid, Sigmoid, and SoftMax), resulting in an
accuracy of 82% for the model. To further enhance the model's accuracy, we
individually applied each of three transfer functions (Sigmoid, ReLU, and
Softmax) to both the hidden and output layers. We chose these functions because
they are suitable for different types of tasks: Sigmoid for binary
classification, ReLU for efficient processing in hidden layers, and Softmax for
multi-class classification. Additionally, we extended the learning rate values
by multiplying two of them by 5, giving the set (0.001, 0.0001, 0.005, 0.0005),
and varied the momentum value (from 0.1 to 0.9) according to the accuracy
achieved by our model. These adjustments enabled the model to make small weight
updates, which is beneficial for fine-tuning the model or handling complex data
patterns. Further details on these adjustments are provided below.
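
The search space described above can be enumerated with a short sketch. The variable names and the dictionary layout are hypothetical, and the momentum step of 0.1 is an assumption about how the range from 0.1 to 0.9 was sampled.

```python
from itertools import product

transfer_functions = ["sigmoid", "relu", "softmax"]
learning_rates = [0.001, 0.0001, 0.005, 0.0005]
# Assumed step of 0.1 across the stated momentum range 0.1 to 0.9.
momentum_values = [round(0.1 * k, 1) for k in range(1, 10)]

def hyperparameter_grid():
    """Yield one configuration per combination of the three settings."""
    for fn, lr, mom in product(transfer_functions, learning_rates, momentum_values):
        yield {"transfer_function": fn, "learning_rate": lr,
               "momentum": mom, "optimizer": "SGD"}

grid = list(hyperparameter_grid())
print(grid[0])
```

Each yielded configuration would then be passed to one training run, and the resulting metrics recorded per configuration.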
Data Collection 

The data collection process unfolded in two distinct phases. Initially, we
conducted a questionnaire survey involving 150 stakeholders to pinpoint
pertinent attributes. Subsequently, we gathered data from 689 B.Tech students
at IP University via Google Forms. These attributes were then grouped into
three interconnected categories: psychological, educational, and background
traits. Background attributes encompass familial elements such as the number of
siblings, parental income, educational achievement, and caste. Educational
traits encompass data related to prior educational experiences, attendance,
admission methods, scholarships, assignments, and language proficiency. Health
factors are important for the physical and mental well-being of students.
Parental relations signify whether the parents share a blood relation or not.
Lastly, travel time indicates the duration of a student's commute. All the
attributes and their ranges are shown in Table 2.
Statistical Analysis 

Table 2 represents the statistical examination of the 
attributes processed in this study. We computed each 
attribute's valid frequency, cumulative frequency, mean, 
standard deviation, variance, and p-value. It's noteworthy 
that no outliers were identified during the analysis. 
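
As an illustration of how such per-attribute statistics can be derived, the sketch below recomputes the frequency columns and the mean for the Language attribute (A5) from its two counts in Table 2. The function name is hypothetical, the use of the population variance is an assumption, and the p-value is omitted because the statistical test behind it is not specified here.

```python
from collections import Counter
import math

def describe(values):
    """Frequency, percentage, cumulative percentage, mean, std and variance."""
    n = len(values)
    counts = Counter(values)
    rows, running = [], 0
    for level in sorted(counts):
        running += counts[level]
        rows.append({"value": level,
                     "frequency": counts[level],
                     "percent": 100.0 * counts[level] / n,
                     "cumulative_percent": 100.0 * running / n})
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance (assumption)
    return rows, mean, math.sqrt(variance), variance

# Language (A5): 268 students coded 0.4 (Others) and 421 coded 0.6 (English).
language = [0.4] * 268 + [0.6] * 421
rows, mean, std, var = describe(language)
print(rows, round(mean, 2), round(var, 3))
```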
Data Preparation and Initialization 

In this step, we prepare the data for processing. We renamed the attributes
A1 to A19, as shown in Table 2. We apply Formula 1 (multiplying the domain
range value f_n of each attribute by a certain weight w_n) to calculate the
attribute's new domain range (A_n).

$A_n = f_n \times w_n$ (Formula 1)
Table 2. Attribute Statistical Description.

Attribute | Domain | Domain value | Valid Frequency | Percentage | Cumulative Percentage | Mean | Standard Deviation | Variance | P-Value
10th Marks (A1) | > 33% | 0 | 0 | 0 | 0 | 0.85 | 0.13 | 0.02 | 3.84E-04
 | 33% - 40% | 0.4 | 26 | 4 | 3.77
 | 41% - 50% | 0.5 | 18 | 3 | 6.38
 | 51% - 60% | 0.6 | 18 | 3 | 8.99
 | 61% - 70% | 0.7 | 28 | 4 | 13.06
 | 71% - 80% | 0.8 | 43 | 6 | 19.3
 | 81% - 100% | 0.9 | 556 | 81 | 100
 | Total | | 689 | 100
12th Marks (A2) | > 33% | 0 | 0 | 0 | 0 | 0.83 | 0.13 | 0.018 | 2.42E-07
 | 33% - 40% | 0.4 | 22 | 3 | 3.193
 | 41% - 50% | 0.5 | 33 | 5 | 7.98
 | 51% - 60% | 0.6 | 29 | 4 | 12.19
 | 61% - 70% | 0.7 | 26 | 4 | 15.96
 | 71% - 80% | 0.8 | 50 | 7 | 23.22
 | 81% - 100% | 0.9 | 529 | 77 | 100
 | Total | | 689 | 100
B.Tech 1yr Marks (A3) | > 33% | 0 | 0 | 0 | 0 | 0.81 | 0.13 | 0.016 | 1.93E-16
 | 33% - 40% | 0.4 | 24 | 3 | 3.48
 | 41% - 50% | 0.5 | 14 | 2 | 5.51
 | 51% - 60% | 0.6 | 31 | 4 | 10.01
 | 61% - 70% | 0.7 | 41 | 6 | 15.96
 | 71% - 80% | 0.8 | 224 | 33 | 48.48
 | 81% - 100% | 0.9 | 355 | 52 | 100
 | Total | | 689 | 100
Parents Annual Salary (A4) | below 199999 | 0.4 | 269 | 39 | 39.04 | 0.59 | 0.14 | 0.019 | 3.73E-10
 | 200000 - 599999 | 0.6 | 221 | 32 | 71.11
 | 600000 - 1099999 | 0.7 | 162 | 24 | 94.63
 | 1100000 - 1599999 | 0.8 | 24 | 3 | 98.11
 | greater than 1600000 | 0.9 | 13 | 2 | 100
 | Total | | 689 | 100
Language (A5) | Others | 0.4 | 268 | 39 | 38.89 | 0.52 | 0.09 | 0.009 | 1.00E-18
 | English | 0.6 | 421 | 61 | 100
 | Total | | 689 | 100
Category (Caste) (A6) | General | 0.4 | 324 | 47 | 47.02 | 0.53 | 0.13 | 0.016 | 4.64E-03
 | OBC | 0.6 | 189 | 27 | 74.45
 | SC & ST | 0.7 | 176 | 26 | 100
 | Total | | 689 | 100
Admission Mode (A7) | Management Quota | 0.4 | 268 | 39 | 38.89 | 0.52 | 0.09 | 0.009 | 1.22E-09
 | Entrance | 0.6 | 421 | 61 | 100
 | Total | | 689 | 100
Attendance (A8) | > 30% | 0 | 0 | 0 | 0 | 0.76 | 0.17 | 0.03 | 2.09E-26
 | 30% - 40% | 0.4 | 89 | 13 | 12.91
 | 41% - 50% | 0.6 | 111 | 16 | 29.02
 | 51% - 60% | 0.7 | 71 | 10 | 39.33
 | 61% - 70% | 0.8 | 30 | 4 | 43.68
 | Above 70% | 0.9 | 388 | 56 | 100
 | Total | | 689 | 100
Scholarship (A9) | No | 0.4 | 97 | 14 | 14.07 | 0.57 | 0.07 | 0.005 | 6.70E-03
 | Yes | 0.6 | 592 | 86 | 100
 | Total | | 689 | 100
Gender (A10) | Female | 0.6 | 96 | 14 | 100 | 0.43 | 0.07 | 0.005 | 1.12E-06
 | Male | 0.4 | 593 | 86 | 86.06
 | Total | | 689 | 100
Mother Education (A11) | below 10 | 0 | 0 | 0 | 0 | 0.75 | 0.14 | 0.018 | 1.06E-26
 | 10th | 0.4 | 41 | 6 | 5.95
 | 12th | 0.6 | 144 | 21 | 26.85
 | Graduation | 0.8 | 337 | 49 | 75.76
 | Post Graduation | 0.9 | 167 | 24 | 100
 | Total | | 689 | 100
Father Education (A12) | below 10 | 0 | 0 | 0 | 0 | 0.57 | 0.07 | 0.004 | 1.20E-05
 | 10th | 0.4 | 33 | 5 | 4.78
 | 12th | 0.6 | 105 | 15 | 20.02
 | Graduation | 0.8 | 417 | 61 | 80.55
 | Post Graduation | 0.9 | 134 | 19 | 100
 | Total | | 689 | 100
Siblings (A13) | Yes | 0.4 | 297 | 43 | 43.1 | 0.57 | 0.07 | 0.005 | 6.12E-16
 | No | 0.6 | 392 | 57 | 100
 | Total | | 689 | 100
Assignment (A14) | No | 0.4 | 318 | 46 | 46.15 | 0.57 | 0.07 | 0.005 | 4.20E-22
 | Yes | 0.6 | 371 | 54 | 100
 | Total | | 689 | 100
Mother's Job (A15) | Other | 0.4 | 49 | 7 | 7.11 | 0.57 | 0.07 | 0.01 | 2.45E-06
 | Home Maker | 0.5 | 16 | 2 | 9.43
 | Civil services | 0.6 | 10 | 1 | 10.88
 | Health care | 0.7 | 28 | 4 | 14.94
 | Business | 0.8 | 365 | 53 | 67.92
 | Teacher | 0.9 | 221 | 32 | 100
 | Total | | 689 | 100
Father's Job (A16) | Other | 0.4 | 46 | 7 | 6.67 | 0.57 | 0.07 | 0.01 | 6.20E-05
 | Home Maker | 0.5 | 18 | 3 | 9.28
 | Civil services | 0.6 | 9 | 1 | 10.59
 | Health care | 0.7 | 27 | 4 | 14.51
 | Business | 0.8 | 374 | 54 | 68.79
 | Teacher | 0.9 | 215 | 31 | 100
 | Total | | 689 | 100
Travel Time (A17) | 15 mins - 30 mins | 0.4 | 320 | 46 | 46.44 | 0.57 | 0.07 | 0.004 | 6.31E-12
 | 1 hour | 0.6 | 199 | 29 | 75.32
 | <1 hour | 0.7 | 170 | 25 | 100
 | Total | | 689 | 100
Health Issue (A18) | Yes | 0.4 | 435 | 63 | 63.13 | 0.47 | 0.09 | 0.009 | 8.35E-13
 | No | 0.6 | 254 | 37 | 100
 | Total | | 689 | 100
Parents Status (A19) | Divorced | 0.4 | 322 | 47 | 46.73 | 0.50 | 0.10 | 0.009 | 3.69E-04
 | Living Together | 0.6 | 367 | 53 | 100
 | Total | | 689 | 100



After applying the formula to the attributes, we calculate the attribute
ranges according to the formulas below:

A1 = A1*1.5, A2 = A2*2, A3 = A3*4, A4 = A4*1.5, A5 = A5*2.5, A6 = A6*1,
A7 = A7*2.5, A8 = A8*3, A9 = A9*1.5, A10 = A10*3, A11 = A11*3, A12 = A12*1.5,
A13 = A13*3, A14 = A14*3.5, A15 = A15*1.5, A16 = A16*1, A17 = A17*2.5,
A18 = A18*2, A19 = A19*1

These weights were finalized according to the importance of each attribute,
which was calculated based on the stakeholders' answers. Then, we sum all these
attributes to calculate the total, as shown in Formula 2. We analyzed the total
and classified the final performance into four categories (Poor, Sufficient,
Good, and Excellent), as shown in Rule 1. In Rule 1, we divide the total into
ranges, and students are assigned to a category according to the range in which
their total falls. After applying all the formulas and rules, the dataset is as
shown in Figure 2.

$\text{Total} = \sum_{n=1}^{19} A_n$ (Formula 2)

    data.loc[(data.total >= 24) & (data.total <= 27), 'FinalGrade'] = 'Good'
    data.loc[(data.total >= 22) & (data.total < 24), 'FinalGrade'] = 'Satisfactory'
    data.loc[(data.total > 19) & (data.total < 22), 'FinalGrade'] = 'POOR'
Rule 1
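
A compact sketch of Formula 1, Formula 2, and Rule 1 is given below. It is a hedged illustration rather than the study's code: the function names are hypothetical, the first two rules are assumed to use "greater than or equal" comparisons, totals outside the three listed ranges are returned as None because no rule for them is listed, and the sample student's domain values are chosen only for demonstration.

```python
# Attribute weights w_n as listed for A1 to A19.
WEIGHTS = {"A1": 1.5, "A2": 2, "A3": 4, "A4": 1.5, "A5": 2.5, "A6": 1,
           "A7": 2.5, "A8": 3, "A9": 1.5, "A10": 3, "A11": 3, "A12": 1.5,
           "A13": 3, "A14": 3.5, "A15": 1.5, "A16": 1, "A17": 2.5,
           "A18": 2, "A19": 1}

def weighted_total(domain_values):
    """Formula 1 (A_n = f_n * w_n) followed by Formula 2 (sum over all A_n)."""
    return sum(domain_values[name] * w for name, w in WEIGHTS.items())

def final_grade(total):
    """Rule 1, with the lower bounds of the first two rules assumed inclusive."""
    if 24 <= total <= 27:
        return "Good"
    if 22 <= total < 24:
        return "Satisfactory"
    if 19 < total < 22:
        return "POOR"
    return None  # total not covered by the three rules listed

# Hypothetical student described by the domain values f_n of Table 2.
student = {"A1": 0.9, "A2": 0.9, "A3": 0.9, "A4": 0.4, "A5": 0.4, "A6": 0.4,
           "A7": 0.4, "A8": 0.4, "A9": 0.6, "A10": 0.4, "A11": 0.9, "A12": 0.8,
           "A13": 0.6, "A14": 0.4, "A15": 0.9, "A16": 0.8, "A17": 0.4,
           "A18": 0.4, "A19": 0.4}
total = weighted_total(student)
print(round(total, 2), final_grade(total))
```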


Data Processing 

After data preparation and initialization, we evaluate the data for
processing: anomalies are removed, and rows with empty values are filled or
removed. We then correlate each attribute with the others using the attribute
elevating algorithm. We also calculated the feature correlation and the feature
importance score of the attributes, shown in Fig 3.
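
Pairwise feature correlation of this kind can be sketched as follows. The Pearson coefficient is an assumption, since the text does not name the correlation measure, and the column values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; undefined when a column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Correlate every attribute with every other attribute."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical weighted attribute columns for five students.
columns = {"A1": [1.35, 1.35, 1.20, 0.90, 0.60],
           "A2": [1.8, 1.8, 1.6, 1.2, 1.4],
           "A8": [1.2, 2.1, 2.7, 2.7, 2.7]}
matrix = correlation_matrix(columns)
print(round(matrix["A1"]["A2"], 2))
```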
       A1   A2   A3    A4    A5   A6   A7   A8   A9  A10  A11   A12  A13  A14   A15  A16   A17  A18  A19  FinalGrade
0    1.35  1.8  3.6  0.60  1.00  0.4  1.0  1.2  0.9  1.2  2.7  1.20  1.8  1.4  1.35  0.8  1.00  0.8  0.4  Good
1    1.35  1.8  3.6  0.90  1.00  0.6  1.0  2.1  0.9  1.2  2.7  1.20  1.8  1.4  1.35  0.8  1.50  0.8  0.6  Good
2    1.35  1.8  3.2  0.90  1.00  0.6  1.0  2.1  0.9  1.2  2.7  1.20  1.8  1.4  1.20  0.9  1.50  1.2  0.6  Good
3    1.20  1.6  3.2  1.05  1.00  0.6  1.0  2.7  0.9  1.2  2.7  1.20  1.8  1.4  1.20  0.8  1.75  1.2  0.6  Excellent
4    1.35  1.8  3.2  1.35  1.50  0.6  1.0  2.7  0.9  1.2  2.7  1.20  1.2  2.1  1.20  0.8  1.75  1.2  0.4  Excellent
...
684  1.35  1.8  3.2  0.60  1.00  0.4  1.0  2.7  0.9  1.2  2.4  1.20  1.8  2.1  1.20  0.8  1.00  1.2  0.6  Good
685  0.90  1.2  2.4  0.60  1.75  0.4  1.0  2.7  0.9  1.8  2.4  1.35  1.8  2.1  1.35  0.9  1.50  1.2  0.6  Good
686  0.60  1.4  2.8  0.60  1.75  0.4  1.0  2.7  0.9  1.2  2.4  1.35  1.8  2.1  0.60  0.8  1.75  1.2  0.6  Good
687  1.20  1.6  2.8  0.60  1.00  0.6  1.0  2.7  0.9  1.2  2.4  1.35  1.8  2.1  0.60  0.4  1.00  1.2  0.6  Good
688  1.05  1.2  2.4  0.60  1.00  0.6  1.0  2.1  0.9  1.2  2.4  1.35  1.8  2.1  0.90  0.8  1.00  1.2  0.6  Good
689 rows x 20 columns

Figure 2. Dataset after applying the rules and formula.
Model Implementation:-

The implementation and analysis of the model were carried out using Python
tools. The artificial neural network (ANN) was implemented to predict students'
academic performance through systematic training and testing on the data. The
dataset was divided into distinct training and testing sets, and accuracy was
computed for both. The pseudocode for the tuning process of the ANN model is
given below.

Pseudocode for the Tuning Process:-

Procedure NeuralNetworkConfiguration():
    // Neural Network Architecture Parameters
    InputLayerNodes <- 482
    HiddenLayerNodes <- 241
    OutputLayerNodes <- 2
    NumberOfHiddenLayers <- 1

    // Training Parameters
    TotalEpochs <- 100
    OutputTransferFunctions <- [Sigmoid, Relu, Softmax]
    LearningRates <- [0.001, 0.005, 0.0001, 0.0005]
    MomentumRange <- [0.1 to 0.9]
    OptimizationAlgorithms <- SGD

    // Neural Network Configuration Steps
    InitializeNeuralNetwork(InputLayerNodes, HiddenLayerNodes, OutputLayerNodes)
    SetHiddenLayers(NumberOfHiddenLayers)
    ConfigureTrainingAndTesting(TotalEpochs)
    ConfigureOutputTransferFunctions(OutputTransferFunctions)
    SetLearningRates(LearningRates)
    SetMomentumRange(MomentumRange)
    ChooseOptimizationAlgorithm(OptimizationAlgorithms)
End Procedure
Figure 3. Feature correlation.


Pseudocode of the ANN Model

Class NeuralNetwork:
    Constructor(X, y, size_hidden, eta, my, epochs, optimizer, verbose):
        Initialize samples, labels, w01, w12, v01, v12, g01, g12, b1, b2,
        eta, epochs, my, optimizer, and verbose

    Function sigmoid(x, deriv):
        If deriv is true, return x * (1 - x)
        Else, return 1 / (1 + exp(-x))

    Function softmax(x, deriv):
        If deriv is true, calculate and return the partial derivative
        Else, calculate and return the softmax function

    Function relu(x, deriv):
        If deriv is true, return 1. * (x > 0)
        Else, return x * (x > 0)

    Function fit():
        Initialize accuracy and no_epochs lists
        Initialize sample_no to 0
        If optimizer is "SGD", initialize gti_01 and gti_12 matrices
        For each epoch in range(epochs):
            For i in range(len(samples)):
                Increment sample_no by 1
                Set l0 to the i-th sample
                Set y to the i-th label
                // Feed Forward Pass:
                Calculate l1 and l2 using relu and softmax activation functions
                Calculate l2_error and l2_error_total
                If l2_error_total is 1.0, return with an "Overflow" message
                // Backpropagation:
                Calculate l2_delta
                Calculate l1_delta
                // Update weights using SGD if the optimizer is "SGD"
                If optimizer is "SGD":
                    Update weights using SGD
            If epoch is divisible by 1:
                If verbose is true, print epoch, error, and accuracy on the
                test and training sets
                Append accuracy to the accuracy list
                Append sample_no to the no_epochs list

    Function predict(test_samples, test_labels):
        Calculate l1 and l2 using relu and softmax activation functions
        Convert the predicted labels using argmax and checkEqual1 functions
        Return the predicted labels and true labels

// For each eta in etas:
For each eta in etas:
    // Create an instance of NeuralNetwork
    neural_net = NeuralNetwork(X, y, size_hidden, eta, my, epochs, optimizer, verbose)
    // Fit the model to the dataset
    neural_net.fit()
    // Plot accuracy and error
    Plot accuracy and error
    // Predict and print accuracy
    predicted_labels, true_labels = neural_net.predict(test_samples, test_labels)
    Print accuracy
    // Print classification report
    Print classification report
Pseudocode for Training the Individual Classifiers:-

# Input: preprocessed data X_train, y_train, X_test, y_test
# Output: performance metrics of the individual models, metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def train_individual_models(X_train, y_train, X_test, y_test):
    # Define a list of models with their names
    models = [('KNN', KNeighborsClassifier()),
              ('MLP', MLPClassifier()),
              ('SVC', SVC()),
              ('GNB', GaussianNB()),
              ('DT', DecisionTreeClassifier()),
              ('LR', LogisticRegression()),
              ('Random Forest', RandomForestClassifier())]

    # Create an empty list to store the performance metrics of each model
    metrics = []

    # Loop through each model in the collection of models
    for name, model in models:
        # Train the model using the preprocessed training data
        model.fit(X_train, y_train)

        # Make predictions on the preprocessed test data
        y_pred = model.predict(X_test)

        # Calculate various performance metrics
        # (the default arguments assume binary labels encoded as 0/1)
        precision = precision_score(y_test, y_pred)
        recall = recall_score(y_test, y_pred)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)
        auc = roc_auc_score(y_test, y_pred)

        # Store the model name and its associated performance metrics
        model_metrics = {
            'name': name,
            'precision': precision,
            'recall': recall,
            'accuracy': accuracy,
            'f1': f1,
            'auc': auc
        }

        # Append the model's metrics dictionary to the list of metrics
        metrics.append(model_metrics)

    # Return the list of metrics containing performance information for each model
    return metrics

In our model implementation, the neural network featured one input layer with
482 nodes, one hidden layer with 241 nodes, and one output layer with 2 nodes.
The dataset was split into training and testing sets in a 70% to 30% ratio.
Each training and testing session consisted of 100 epochs.

Additionally, we implemented other machine learning 
models, including Decision Tree, Naive Bayes, Support 
Vector Machine, K-Nearest Neighbor, Random Forest, 
and Logistic Regression, with the same 70% to 30% 
training-to-testing set ratio. 
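
The 70% to 30% split can be sketched with the standard library alone. The function name, the shuffling step, and the fixed seed are assumptions made for illustration; only the ratio and the dataset size of 689 come from the text.

```python
import random

def train_test_split_indices(n_samples, train_fraction=0.7, seed=42):
    """Shuffle the row indices and cut them into training and testing parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_split_indices(689)
print(len(train_idx), len(test_idx))
```

With 689 records, this split leaves 482 rows for training and 207 for testing.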

The formulas for accuracy, F-score, recall, precision, 
and ROC curve are provided in Formula 3, where TP 
(True Positive), TN (True Negative), FN (False 
Negative), and FP (False Positive) are defined. 


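$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1\_score} = \frac{2 \times (\text{precision} \times \text{recall})}{\text{precision} + \text{recall}}$$

Formula 3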
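
The metrics in Formula 3 can be computed directly from the four confusion-matrix counts, as in the sketch below. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the text.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * (precision * recall) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for a binary classifier evaluated on 200 test samples.
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print({name: round(value, 3) for name, value in m.items()})
```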


Model Evaluation 

Model evaluation is divided into two parts. The first part (section F.1)
presents the outcomes of assessing the ANN model with various combinations of
learning rates, momentum values, transfer functions, and optimization
algorithms. The second part (section F.2) compares the results of various
algorithms (Decision Tree, Naive Bayes, Logistic Regression, SVM, Random
Forest, KNN, and ANN).

Figure 4. Training and validation graph of accuracy and loss when (Function=Softmax, lr=0.4)

Figure 5. Training and validation graph of accuracy and loss when (Function=Sigmoid, lr=0.7)


Results of the testing and training 

During this stage, it was noted that attaining a lower error didn't
necessarily lead to the best overall performance, as evidenced by this
analysis. After adjusting the learning rate and momentum values, we conducted a
comprehensive assessment of 107 outcomes. In the case of the softmax function,
a learning rate of 0.005 and a momentum value of 0.4 resulted in a low accuracy
of 52%. Training and validation accuracy is shown in Fig 4. Conversely, when
experimenting with the sigmoid, softmax, and relu functions using different
learning rates and momentum values, the highest accuracy, 98%, was achieved by
the sigmoid function with a momentum value of 0.7 and a learning rate of 0.005.
Training and validation accuracy is shown in Fig 5. These findings collectively
indicate that the sigmoid function excels in terms of achieving a lower error
curve, higher accuracy, and quicker training and testing speeds compared to the
other functions utilized in fitting the model. All weighted-average
classification metrics are shown in Table 3.
Results Comparison of the Model Evaluation 

Figures 6, 7, and 8 present a comprehensive comparative analysis of several
algorithms, including Decision Tree, Naive Bayes, Logistic Regression, SVM,
Random Forest, KNN, and ANN. Figures 6, 7, and 8 focus on micro-averaged,
macro-averaged, and weighted-averaged metrics, with a training-to-testing ratio
set at 70% to 30%. Among these algorithms, MLP achieved the highest accuracy at
96%, while Decision Tree demonstrated the lowest accuracy at 89%. MLP exhibited
the highest recall, precision, and F1-score, establishing it as the most
effective predictor in this category. Figure 9 provides insights into the ROC
curve for all classifiers. In this representation, LR displayed a superior AUC
value of 0.97. In contrast, the Decision Tree exhibited the lowest AUC value at
0.70.
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Table 3. Weighted average classification report(Accuracy, F-score, Recall, Precision) 


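| Output function   | Learning rate | Momentum | Accuracy | Precision | Recall | F1 score |
+-------------------+---------------+----------+----------+-----------+--------+----------+
| [illegible: Roe]  | 0.001         | 0.1      | 0.92     | 0.86      | 0.93   | 0.89     |
| [illegible: Roe]  | 0.001         | 0.2      | 0.71     | 0.88      | 0.71   | 0.78     |
| [illegible: Roe]  | 0.001         | 0.3      | 0.96     | 0.93      | 0.96   | 0.95     |
| [illegible: Roe]  | 0.001         | 0.4      | 0.91     | 0.84      | 0.92   | 0.88     |
| [illegible: Roe]  | 0.001         | 0.5      | 0.91     | 0.83      | 0.91   | 0.87     |
| [illegible: Roe]  | 0.001         | 0.6      | 0.95     | 0.91      | 0.96   | 0.93     |
| [illegible: Roe]  | 0.001         | 0.7      | 0.94     | 0.90      | 0.95   | 0.92     |
| [illegible: Roe]  | 0.001         | 0.8      | 0.88     | 0.79      | 0.89   | 0.84     |
| [illegible: Roe]  | 0.001         | 0.9      | 0.93     | 0.87      | 0.93   | 0.90     |
| [illegible: Roe]  | 0.005         | 0.1      | 0.78     | 0.83      | 0.78   | 0.74     |
| [illegible: Roe]  | 0.005         | 0.2      | 0.90     | 0.95      | 0.90   | 0.92     |
| [illegible: Roe]  | 0.005         | 0.3      | 0.70     | 0.91      | 0.70   | 0.79     |
| [illegible: Roe]  | 0.005         | 0.4      | 0.52     | 0.75      | 0.52   | 0.36     |
| [illegible: Roe]  | 0.005         | 0.5      | 0.89     | 0.80      | 0.90   | 0.85     |
| [illegible: Roe]  | 0.005         | 0.6      | 0.72     | 0.92      | 0.73   | 0.81     |
| [illegible: Roe]  | 0.005         | 0.7      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: Roe]  | 0.005         | 0.8      | 0.85     | 0.73      | 0.85   | 0.79     |
| [illegible: Roe]  | 0.005         | 0.9      | 0.90     | 0.82      | 0.90   | 0.87     |
| [illegible: Roe]  | 0.0001        | 0.1      | 0.94     | 0.88      | 0.94   | 0.91     |
| [illegible: Roe]  | 0.0001        | 0.2      | 0.90     | 0.82      | 0.91   | 0.86     |
| [illegible: Roe]  | 0.0001        | 0.3      | 0.93     | 0.88      | 0.94   | 0.90     |
| [illegible: Roe]  | 0.0001        | 0.4      | 0.92     | 0.86      | 0.93   | 0.89     |
| [illegible: Roe]  | 0.0001        | 0.5      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: Roe]  | 0.0001        | 0.6      | 0.90     | 0.82      | 0.91   | 0.86     |
| [illegible: Roe]  | 0.0001        | 0.7      | 0.95     | 0.90      | 0.95   | 0.92     |
| [illegible: Roe]  | 0.0001        | 0.8      | 0.92     | 0.85      | 0.92   | 0.88     |
| [illegible: Roe]  | 0.0001        | 0.9      | 0.91     | 0.83      | 0.91   | 0.87     |
| [illegible: Roe]  | 0.0005        | 0.1      | 0.90     | 0.82      | 0.90   | 0.86     |
| [illegible: Roe]  | 0.0005        | 0.2      | 0.92     | 0.85      | 0.92   | 0.88     |
| [illegible: Roe]  | 0.0005        | 0.3      | 0.95     | 0.90      | 0.95   | 0.93     |
| [illegible: Roe]  | 0.0005        | 0.4      | 0.90     | 0.81      | 0.90   | 0.86     |
| [illegible: Roe]  | 0.0005        | 0.5      | 0.92     | 0.86      | 0.92   | 0.89     |
| [illegible: Roe]  | 0.0005        | 0.6      | 0.93     | 0.96      | 0.94   | 0.95     |
| [illegible: Roe]  | 0.0005        | 0.7      | 0.91     | 0.84      | 0.92   | 0.88     |
| [illegible: Roe]  | 0.0005        | 0.8      | 0.94     | 0.90      | 0.95   | 0.92     |
| [illegible: Roe]  | 0.0005        | 0.9      | 0.93     | 0.88      | 0.94   | 0.91     |
| [illegible: eats] | 0.001         | 0.1      | 0.92     | 0.85      | 0.92   | 0.88     |
| [illegible: eats] | 0.001         | 0.2      | 0.95     | 0.90      | 0.95   | 0.93     |
| [illegible: eats] | 0.001         | 0.3      | 0.94     | 0.87      | 0.93   | 0.90     |
| [illegible: eats] | 0.001         | 0.4      | 0.93     | 0.86      | 0.93   | 0.89     |
| [illegible: eats] | 0.001         | 0.5      | 0.95     | 0.90      | 0.95   | 0.92     |
| [illegible: eats] | 0.001         | 0.6      | 0.96     | 0.92      | 0.96   | 0.94     |
| [illegible: eats] | 0.001         | 0.7      | 0.91     | 0.84      | 0.91   | 0.88     |
| [illegible: eats] | 0.001         | 0.8      | 0.93     | 0.86      | 0.93   | 0.89     |
| [illegible: eats] | 0.001         | 0.9      | 0.95     | 0.90      | 0.95   | 0.92     |
| [illegible: eats] | 0.005         | 0.1      | 0.93     | 0.87      | 0.93   | 0.90     |
| [illegible: eats] | 0.005         | 0.2      | 0.90     | 0.82      | 0.90   | 0.87     |
| [illegible: eats] | 0.005         | 0.3      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: eats] | 0.005         | 0.4      | 0.93     | 0.84      | 0.92   | 0.88     |
| [illegible: eats] | 0.005         | 0.5      | 0.92     | 0.86      | 0.92   | 0.89     |
| [illegible: eats] | 0.005         | 0.6      | 0.95     | 0.90      | 0.95   | 0.93     |
| [illegible: eats] | 0.005         | 0.7      | 0.93     | 0.87      | 0.93   | 0.90     |
| [illegible: eats] | 0.005         | 0.8      | 0.94     | 0.90      | 0.94   | 0.92     |
| [illegible: eats] | 0.005         | 0.9      | 0.92     | 0.88      | 0.92   | 0.90     |
| [illegible: eats] | 0.0001        | 0.1      | 0.95     | 0.90      | 0.95   | 0.92     |
| [illegible: eats] | 0.0001        | 0.2      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: eats] | 0.0001        | 0.3      | 0.93     | 0.87      | 0.93   | 0.90     |
| [illegible: eats] | 0.0001        | 0.4      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: eats] | 0.0001        | 0.5      | 0.95     | 0.90      | 0.95   | 0.92     |
| [illegible: eats] | 0.0001        | 0.6      | 0.91     | 0.84      | 0.91   | 0.88     |
| [illegible: eats] | 0.0001        | 0.7      | 0.93     | 0.88      | 0.93   | 0.91     |
| [illegible: eats] | 0.0001        | 0.8      | 0.90     | 0.82      | 0.90   | 0.86     |
| [illegible: eats] | 0.0001        | 0.9      | 0.92     | 0.85      | 0.92   | 0.88     |
| [illegible: eats] | 0.0005        | 0.1      | 0.93     | 0.88      | 0.93   | 0.91     |
| [illegible: eats] | 0.0005        | 0.2      | 0.91     | 0.83      | 0.91   | 0.87     |
| [illegible: eats] | 0.0005        | 0.3      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: eats] | 0.0005        | 0.4      | 0.88     | 0.79      | 0.88   | 0.83     |
| [illegible: eats] | 0.0005        | 0.5      | 0.92     | 0.86      | 0.92   | 0.88     |
| [illegible: eats] | 0.0005        | 0.6      | 0.94     | 0.89      | 0.94   | 0.91     |
| [illegible: eats] | 0.0005        | 0.7      | 0.93     | 0.88      | 0.93   | 0.90     |
| [illegible: eats] | 0.0005        | 0.8      | 0.95     | 0.90      | 0.95   | 0.93     |
| [illegible: eats] | 0.0005        | 0.9      | 0.92     | 0.85      | 0.92   | 0.88     |
| Sigmoid           | 0.001         | 0.1      | 0.95     | 0.90      | 0.95   | 0.93     |
| Sigmoid           | 0.001         | 0.2      | 0.91     | 0.84      | 0.92   | 0.88     |
| Sigmoid           | 0.001         | 0.3      | 0.90     | 0.82      | 0.90   | 0.86     |
| Sigmoid           | 0.001         | 0.4      | 0.93     | 0.88      | 0.93   | 0.91     |
| Sigmoid           | 0.001         | 0.5      | 0.90     | 0.82      | 0.90   | 0.86     |
| Sigmoid           | 0.001         | 0.6      | 0.91     | 0.87      | 0.91   | 0.89     |
| Sigmoid           | 0.001         | 0.7      | 0.93     | 0.88      | 0.93   | 0.91     |
| Sigmoid           | 0.001         | 0.8      | 0.92     | 0.85      | 0.92   | 0.88     |
| Sigmoid           | 0.001         | 0.9      | 0.91     | 0.84      | 0.92   | 0.88     |
| Sigmoid           | 0.005         | 0.1      | 0.94     | 0.89      | 0.94   | 0.90     |
| Sigmoid           | 0.005         | 0.2      | 0.93     | 0.88      | 0.93   | 0.91     |
| Sigmoid           | 0.005         | 0.3      | 0.94     | 0.94      | 0.93   | 0.94     |
| Sigmoid           | 0.005         | 0.4      | 0.95     | 0.90      | 0.95   | 0.93     |
| Sigmoid           | 0.005         | 0.5      | 0.91     | 0.84      | 0.92   | 0.88     |
| Sigmoid           | 0.005         | 0.6      | 0.89     | 0.80      | 0.90   | 0.85     |
| Sigmoid           | 0.005         | 0.7      | 0.98     | 0.94      | 0.98   | 0.95     |
| Sigmoid           | 0.005         | 0.8      | 0.93     | 0.88      | 0.93   | 0.91     |
| Sigmoid           | 0.005         | 0.9      | 0.94     | 0.92      | 0.94   | 0.93     |
| Sigmoid           | 0.0001        | 0.1      | 0.95     | 0.90      | 0.95   | 0.93     |
| Sigmoid           | 0.0001        | 0.2      | 0.94     | 0.92      | 0.94   | 0.93     |
| Sigmoid           | 0.0001        | 0.3      | 0.90     | 0.82      | 0.90   | 0.86     |
| Sigmoid           | 0.0001        | 0.4      | 0.93     | 0.88      | 0.93   | 0.91     |
| Sigmoid           | 0.0001        | 0.5      | 0.94     | 0.92      | 0.94   | 0.93     |
| Sigmoid           | 0.0001        | 0.6      | 0.92     | 0.85      | 0.92   | 0.88     |
| Sigmoid           | 0.0001        | 0.7      | 0.96     | 0.92      | 0.96   | 0.94     |
| Sigmoid           | 0.0001        | 0.8      | 0.92     | 0.85      | 0.92   | 0.88     |
| Sigmoid           | 0.0001        | 0.9      | 0.91     | 0.84      | 0.92   | 0.88     |
| Sigmoid           | 0.0005        | 0.1      | 0.90     | 0.82      | 0.91   | 0.86     |
| Sigmoid           | 0.0005        | 0.2      | 0.94     | 0.90      | 0.95   | 0.92     |
| Sigmoid           | 0.0005        | 0.3      | 0.95     | 0.90      | 0.95   | 0.93     |
| Sigmoid           | 0.0005        | 0.4      | 0.95     | 0.92      | 0.96   | 0.94     |
| Sigmoid           | 0.0005        | 0.5      | 0.95     | 0.90      | 0.95   | 0.93     |
| Sigmoid           | 0.0005        | 0.6      | 0.93     | 0.88      | 0.93   | 0.91     |
| Sigmoid           | 0.0005        | 0.7      | 0.91     | 0.83      | 0.91   | 0.87     |
| Sigmoid           | 0.0005        | 0.8      | 0.91     | 0.83      | 0.91   | 0.87     |
| Sigmoid           | 0.0005        | 0.9      | 0.94     | 0.90      | 0.95   | 0.92     |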
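
Each row of Table 3 corresponds to one combination of learning rate and momentum. To make the role of these two hyperparameters concrete, the following minimal sketch trains a single sigmoid unit with stochastic gradient descent and classical momentum, using the best-performing setting reported in this study (learning rate 0.005, momentum 0.7) as defaults. The toy data, the function names, and the exact form of the update rule are assumptions made for illustration; this is not the network or the training code used in the study, and deep learning libraries differ slightly in how they define the momentum update.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_sigmoid_unit(data, lr=0.005, momentum=0.7, epochs=200, seed=0):
    """Fit one sigmoid neuron with per-sample SGD and classical momentum.

    Update rule (assumed form): v <- momentum * v - lr * grad;  w <- w + v.
    Returns the weight, the bias, and the mean log loss of each epoch.
    """
    rng = random.Random(seed)
    samples = list(data)
    w, b = 0.0, 0.0    # parameters
    vw, vb = 0.0, 0.0  # velocity terms that accumulate past gradients
    losses = []
    for _ in range(epochs):
        rng.shuffle(samples)
        total = 0.0
        for x, y in samples:
            p = sigmoid(w * x + b)
            total += -(y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12))
            grad = p - y  # derivative of the log loss w.r.t. the pre-activation
            vw = momentum * vw - lr * grad * x
            vb = momentum * vb - lr * grad
            w += vw
            b += vb
        losses.append(total / len(samples))
    return w, b, losses


# Hypothetical, linearly separable toy data: (feature, label).
toy_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, losses = train_sigmoid_unit(toy_data)
print(f"first-epoch loss {losses[0]:.3f}, last-epoch loss {losses[-1]:.3f}")
```

A larger momentum value lets past gradients carry more weight in each step, while the learning rate scales the size of every update, so the two interact, and a grid over both values is a reasonable way to explore them.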
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Micro-Averaged Metrics: 


+---------------+-------------------+----------------+------------------+----------+
| Model         | Precision (Micro) | Recall (Micro) | F1 Score (Micro) | Accuracy |
+---------------+-------------------+----------------+------------------+----------+
| KNN           | 0.9167            | 0.9167         | 0.9167           | 0.9167   |
| MLP           | 0.9657            | 0.9657         | 0.9657           | 0.9657   |
| SVC           | 0.9167            | 0.9167         | 0.9167           | 0.9167   |
| GNB           | 0.9167            | 0.9167         | 0.9167           | 0.9167   |
| DT            | 0.8971            | 0.8971         | 0.8971           | 0.8971   |
| LR            | 0.9510            | 0.9510         | 0.9510           | 0.9510   |
| Random Forest | 0.9412            | 0.9412         | 0.9412           | 0.9412   |
+---------------+-------------------+----------------+------------------+----------+
Figure 6. Micro averaged metrics of all classifiers

Macro-Averaged Metrics: 

+---------------+-------------------+----------------+------------------+----------+
| Model         | Precision (Macro) | Recall (Macro) | F1 Score (Macro) | Accuracy |
+---------------+-------------------+----------------+------------------+----------+
| KNN           | 0.9583            | 0.5000         | 0.4783           | 0.9167   |
| MLP           | 0.9154            | 0.8476         | 0.8778           | 0.9657   |
| SVC           | 0.9583            | 0.5000         | 0.4783           | 0.9167   |
| GNB           | 0.7361            | 0.8209         | 0.7695           | 0.9167   |
| DT            | 0.6676            | 0.6765         | 0.6718           | 0.8971   |
| LR            | 0.9746            | 0.7059         | 0.7786           | 0.9510   |
| Random Forest | 0.9698            | 0.6471         | 0.7117           | 0.9412   |
+---------------+-------------------+----------------+------------------+----------+
Figure 7. Macro averaged metrics of all classifiers

Weighted-Averaged Metrics: 

+---------------+----------------------+-------------------+---------------------+----------+
| Model         | Precision (Weighted) | Recall (Weighted) | F1 Score (Weighted) | Accuracy |
+---------------+----------------------+-------------------+---------------------+----------+
| KNN           | 0.9236               | 0.9167            | 0.8768              | 0.9167   |
| MLP           | 0.9640               | 0.9657            | 0.9642              | 0.9657   |
| SVC           | 0.9236               | 0.9167            | 0.8768              | 0.9167   |
| GNB           | 0.9329               | 0.9167            | 0.9230              | 0.9167   |
| DT            | 0.8998               | 0.8971            | 0.8984              | 0.8971   |
| LR            | 0.9535               | 0.9510            | 0.9415              | 0.9510   |
| Random Forest | 0.9447               | 0.9412            | 0.9260              | 0.9412   |
+---------------+----------------------+-------------------+---------------------+----------+
Figure 8. Weighted averaged metrics of all classifiers
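
The three averaging schemes in Figures 6, 7, and 8 can give very different pictures of the same classifier when the classes are imbalanced. The following minimal sketch computes all three from per-class counts. The counts used here (187 majority-class and 17 minority-class test samples, with every sample predicted as the majority class) are an inference from the reported fractions, not figures stated in the paper, and treating an undefined precision as 1.0 is likewise an assumption, chosen because it reproduces the KNN and SVC rows.

```python
def prf(tp, fp, fn, zero_division=1.0):
    """Precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else zero_division
    recall = tp / (tp + fn) if (tp + fn) else zero_division
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def averaged_metrics(counts):
    """counts maps class label -> (tp, fp, fn).

    Returns a dict of (precision, recall, f1) tuples for micro, macro, and
    support-weighted averaging.
    """
    per_class = {c: prf(*v) for c, v in counts.items()}
    support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}
    total = sum(support.values())
    macro = tuple(sum(m[i] for m in per_class.values()) / len(per_class) for i in range(3))
    weighted = tuple(sum(per_class[c][i] * support[c] for c in counts) / total for i in range(3))
    # Micro averaging pools the counts of all classes before computing the metrics.
    micro = prf(*(sum(v[i] for v in counts.values()) for i in range(3)))
    return {"micro": micro, "macro": macro, "weighted": weighted}


# Assumed scenario: a classifier that predicts the majority class for all 204 samples.
majority_only = {"majority": (187, 17, 0), "minority": (0, 0, 17)}
result = averaged_metrics(majority_only)
for name, (p, r, f) in result.items():
    print(f"{name:8s} precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

Under these assumptions the micro and weighted recall stay at about 0.9167 while the macro recall drops to 0.5, because macro averaging gives the minority class the same weight as the majority class. This is one plausible reading of why KNN and SVC look strong in Figures 6 and 8 but weak in Figure 7.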
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Figure 9. AUC values of all Classifiers 
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Conclusion 

To understand the factors that influence Artificial
Neural Networks (ANN) in the context of smart
education, our first objective involved categorizing
elements into three distinct groups: background qualities,
educational attributes, and psychological traits.
Background characteristics, encompassing familial
details such as the number of siblings, parents'
educational levels, income, employment status, and
gender, were identified as influential factors impacting
ANN performance, particularly for higher-performance
groups benefiting from enhanced educational support.
Educational attributes, including academic performance
in the 10th, 12th, and B.Tech. first-year examinations,
attendance, and assignment performance, were found to
have the most substantial influence on student outcomes.
Concurrently, psychological attributes, considering
students' mental and physical health, were recognized as
pivotal, acknowledging the correlation between overall
success and good health. These factors collectively
contributed to the discernible impact on the ANN's
performance in the realm of smart education, leading to
the categorization of students based on these influential
factors.

Moving on to our second objective, which centered on
minimizing the ANN error curve, we focused on the
careful selection of hyperparameters. Adjusting
parameters such as epoch size, training size, testing size,
momentum value, and learning rate within the
appropriate range was deemed crucial to avoid local
minima, reduce training and testing times, and optimize
performance. Modifying hyperparameter values was
essential for achieving the best performance and the
shortest error curve in the smart education context.

For our third objective, which involved the performance
comparison of classifiers, we divided the data for all
classifiers into training and testing sets, allocating 70%
and 30% of the data, respectively, as described in the
model evaluation section. Our findings showed that the
ANN achieved an accuracy of 97% in predicting student
achievement, surpassing the Decision Tree classifier,
which achieved an accuracy of 89%. Notably, the
Multilayer Perceptron (MLP) outperformed all other
classifiers in terms of recall, precision, and F-score
values, reinforcing its efficacy in the smart education
domain.


Limitations 
This research paper has provided insights into
artificial neural networks (ANNs), diverse classifiers,
transfer functions, and optimization techniques.
Nonetheless, certain limitations are evident, such as the
relatively small dataset comprising only 689 students.
Additionally, certain factors like students' social
interactions, academic engagement, and interpersonal
skills have been omitted despite their potential influence
on academic performance. These limitations will be
thoroughly investigated and addressed in future research
endeavors aimed at enhancing the accuracy of student
performance prediction.


Future Work 

Future research will prioritize including education-related
variables and utilizing all relevant factors to
enhance prediction accuracy. Our forthcoming models
will consider all constraints outlined in the preceding
section. We will explore diverse topologies, network
configurations, parameters, transfer functions, and
optimization techniques to refine our predictive
capabilities further.
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